-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(io-engine): add persistent store transaction API #1791
base: develop
Are you sure you want to change the base?
Conversation
bors try |
tryBuild succeeded: |
526bdf3
to
5fda89d
Compare
self.0 | ||
.txn(Txn::new().when(cmps).and_then(ops_success).or_else(fops)) | ||
.await | ||
.context(TxnErr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we get an error on the logs which is clear as to what happened?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we get an error on the logs which is clear as to what happened?
It's difficult to get complete info as to what failed as part of transaction. The succeeded()
validation in transaction response is deterministic info of whether transaction compares succeed or fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what does it look like on the logs with this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean from io-engine logging? or api usage. For io-engine it'll be something like below, this is not final yet:
[2025-01-08T12:05:37.051454598+00:00 INFO io_engine::persistent_store:persistent_store.rs:208] Executing transaction for key /openebs.io/mayastor/apis/v0/clusters/761ab476f4f4489b900ded45a464a79d/namespaces/default/volume/1c4e66fd-3b33-4439-b504-d49aba53da26/nexus/2f2455c8-e237-4586-848d-762f26ebb044/info.
[2025-01-08T12:05:37.052938551+00:00 ERROR io_engine::bdev::nexus::nexus_persistence:nexus_persistence.rs:216] Nexus '1c4e66fd-3b33-4439-b504-d49aba53da26' [open]: failed to update persistent store, will shutdown the nexus: failed to save nexus state 1c4e66fd-3b33-4439-b504-d49aba53da26
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to have some more context here on the failure.
Also I hope we distinguish compare errors from general etcd write failures which should be retried?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to have some more context here on the failure. Also I hope we distinguish compare errors from general etcd write failures which should be retried?
Yes the general error if received as Err
from txn API has to be dealt with retries by caller. However if a Put TxnOp failure happens after a successful compare, then the etcd handles it harshly on server side:
_, err := executeTxn(ctx, lg, txnWrite, rt, txnPath, txnResp)
if err != nil {
if isWrite {
// CAUTION: When a txn performing write operations starts, we always expect it to be successful.
// If a write failure is seen we SHOULD NOT try to recover the server, but crash with a panic to make the failure explicit.
// Trying to silently recover (e.g by ignoring the failed txn or calling txn.End() early) poses serious risks:
// - violation of transaction atomicity if some write operations have been partially executed
// - data inconsistency across different etcd members if they applied the txn asymmetrically
lg.Panic("unexpected error during txn with writes", zap.Error(err))
} else {
lg.Error("unexpected error during readonly txn", zap.Error(err))
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to have some more context here on the failure.
Have extended caller side error handling(not in this PR) by printing source
in StateSaveFailed
error to add more context in error, as below.
[2025-01-09T11:29:00.330005232+00:00 INFO io_engine::persistent_store:persistent_store.rs:208] Executing transaction for key /openebs.io/mayastor/apis/v0/clusters/761ab476f4f4489b900ded45a464a79d/namespaces/default/volume/1c4e66fd-3b33-4439-b504-d49aba53da26/nexus/486a4605-1004-409b-b24c-9786c8ae279d/info.
[2025-01-09T11:29:00.331331260+00:00 ERROR io_engine::bdev::nexus::nexus_persistence:nexus_persistence.rs:216] Nexus '1c4e66fd-3b33-4439-b504-d49aba53da26' [open]: failed to update persistent store, will shutdown the nexus: failed to save nexus state 1c4e66fd-3b33-4439-b504-d49aba53da26, Failed to do 'transaction' for key /openebs.io/mayastor/apis/v0/clusters/761ab476f4f4489b900ded45a464a79d/namespaces/default/volume/1c4e66fd-3b33-4439-b504-d49aba53da26/nexus/486a4605-1004-409b-b24c-9786c8ae279d/info. Error io error: Txn CompareOp failed
This change adds API to do Compare-and-Swap operations to the persistent store via io-engine. Signed-off-by: Diwakar Sharma <[email protected]>
This change adds API to do Compare-and-Swap operations to the persistent store via io-engine.